A probabilistic approach to sequence assembly validation

Authors

  • Sun Kim
  • Li Liao
  • Jean-François Tomb

Abstract

Sequence assembly is an essential requirement for determining the complete sequence of long DNA. However, sequence assembly programs often generate misassembled contigs in one of two ways. They may join different repeat copies, which joins noncontiguous DNA regions (inverted or swapped), or they may include many fragments from different repeat copies, which introduces errors into the consensus sequence (noisy regions). Sequence assemblies are usually validated experimentally. Although this is the most reliable approach, it is time-consuming and labor-intensive. In this paper, we propose a probabilistic approach to identify possible misassembled regions in shotgun sequence assemblies. Using statistics gathered from a set of patterns randomly sampled from the shotgun data, we propose a probability model that measures each fragment's contribution to misassembly. From this probability model, we compute the entropy at each base position in the contig assembly. Our approach correctly identified all misassembled regions in the assembly of the Mycoplasma genitalium genome from real shotgun sequence data. Furthermore, using this approach we identified many putative misassembled regions in the assemblies of bacterial genomes we are currently sequencing.
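The abstract does not spell out the probability model or exactly how entropy is derived from it. The Python sketch below illustrates only the final step, under loudly stated assumptions: each fragment placed in a contig has already been assigned a nonnegative misassembly contribution p_f by some model (not reproduced here), and the entropy at base i is taken to be the Shannon entropy of those contributions normalized over the fragments covering i, that is, H(i) = -sum_f q_f log2 q_f with q_f = p_f / sum_g p_g. The function name `position_entropy`, the placement format, and this particular definition of entropy are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Tuple


def position_entropy(
    contig_length: int,
    placements: List[Tuple[str, int, int]],
    contribution: Dict[str, float],
) -> List[float]:
    """Shannon entropy (in bits) of normalized fragment contributions per base.

    HYPOTHETICAL sketch, not the paper's model:
      placements   -- (fragment_id, start, end), 0-based, half-open [start, end)
      contribution -- assumed per-fragment misassembly contribution p_f >= 0,
                      produced by some upstream probability model
    """
    # Gather the contribution of every fragment that covers each base position.
    per_pos: List[List[float]] = [[] for _ in range(contig_length)]
    for frag_id, start, end in placements:
        p = contribution[frag_id]
        for i in range(max(start, 0), min(end, contig_length)):
            per_pos[i].append(p)

    entropies: List[float] = []
    for weights in per_pos:
        total = sum(weights)
        if total <= 0.0:
            # Uncovered position, or no covering fragment contributes: use 0.
            entropies.append(0.0)
            continue
        h = 0.0
        for w in weights:
            if w > 0.0:
                q = w / total  # normalized contribution q_f at this position
                h -= q * math.log2(q)
        entropies.append(h)
    return entropies
```

For example, `position_entropy(6, [("a", 0, 4), ("b", 2, 6)], {"a": 0.5, "b": 0.5})` returns `[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]`: positions covered by a single fragment have zero entropy, and the overlap of two equally weighted fragments yields one bit. Under this assumed definition, positions where several fragments carry comparable weight score highest, which is one way a per-base signal could flag candidate misassembled regions.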

Publication date: 2001